The Generalization-Stability Tradeoff In Neural Network Pruning
Pruning neural network parameters is often viewed as a means to compress models, but pruning has also been motivated by the desire to prevent overfitting. This motivation is particularly relevant given the perhaps surprising observation that a wide variety of pruning approaches increase test accuracy despite sometimes massive reductions in parameter counts. To better understand this phenomenon, we analyze the behavior of pruning over the course of training, finding that pruning's benefit to generalization increases with pruning's instability (defined as the drop in test accuracy immediately following pruning). We demonstrate that this "generalization-stability tradeoff" is present across a wide variety of pruning settings and propose a mechanism for its cause: pruning regularizes similarly to noise injection. Supporting this, we find that less pruning stability leads to more model flatness and that the benefits of pruning do not depend on permanent parameter removal. These results explain the compatibility of pruning-based generalization improvements and the high generalization recently observed in overparameterized networks.
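The instability measure defined in the abstract can be made concrete with a small sketch. The function and variable names below are illustrative (not from the paper), and the magnitude-pruning helper is just one common pruning criterion used as an example:

```python
# Hypothetical sketch of the pruning-instability measure: instability is
# the drop in test accuracy immediately following a pruning event.

def pruning_instability(acc_before: float, acc_after: float) -> float:
    """Drop in test accuracy caused by a single pruning event."""
    return acc_before - acc_after

def magnitude_prune(weights, fraction):
    """Zero out the smallest-magnitude `fraction` of weights — one common
    pruning criterion (iterative magnitude pruning)."""
    k = int(len(weights) * fraction)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

# Example with hypothetical accuracies measured before/after pruning:
acc_before, acc_after = 0.92, 0.88
print(pruning_instability(acc_before, acc_after))   # drop of 0.04

# Pruning half of a toy weight vector by magnitude:
print(magnitude_prune([0.5, -0.1, 0.3, 0.05], 0.5))  # [0.5, 0.0, 0.3, 0.0]
```

Under the paper's framing, a larger value of this instability (a bigger immediate accuracy drop) correlates with a larger final generalization benefit.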
Review for NeurIPS paper: The Generalization-Stability Tradeoff In Neural Network Pruning
Weaknesses: My major concern is with the experimental settings, which are somewhat artificial in my opinion and make me question the generality of the approach. In particular, I would like to see additional experiments along the following lines. The authors do not use weight regularization and only show results using Adam. While I understand the reasoning for this choice — it is probably important in order to amplify the effect of their observation — I would appreciate additional experiments using standard training pipelines, including dropout, data augmentation, and weight regularization. For the same reason, this choice makes me question the general applicability of their observations.
Review for NeurIPS paper: The Generalization-Stability Tradeoff In Neural Network Pruning
The paper studies the effect of pruning on the generalization ability of neural networks. It introduces a notion of pruning instability (which measures the departure from the original function, i.e., the drop in accuracy immediately after pruning) and shows that instability relates positively to the generalization of neural networks. The paper is purely empirical, and while the reviewers initially had some concerns regarding the choice of architectures, hyperparameters, and datasets, some of these concerns were properly addressed in the rebuttal. Overall, the paper introduces an interesting view on pruning which is backed up to a large extent by the experimental results. The reviewers agree that some aspects could be improved and have made many suggestions. I recommend acceptance, but I also strongly encourage the authors to revise the paper according to the reviews to maximize its potential impact.